The `traceAI-langchain` Python package records detailed traces of the model's reasoning, tool usage, and responses. These traces are then evaluated for quality aspects such as completeness, groundedness, hallucination, and correct use of tools: completeness checks that the answer fully addresses the user's query; groundedness verifies that the response is based on retrieved evidence; hallucination detection flags unsupported or fabricated content; and the tool-usage eval checks whether the agent invokes external tools appropriately and integrates their results correctly. Together, these metrics help developers build agents that are not only intelligent but also reliable, explainable, and production-ready.
Before running the cookbook, set the required credentials as environment variables:

- `GOOGLE_API_KEY` and `GOOGLE_CSE_ID` for the Google Search tool
- `OPENAI_API_KEY` for the OpenAI chat model
- `FI_API_KEY` and `FI_SECRET_KEY` for Future AGI tracing and evals
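A minimal way to set them from Python is sketched below (the values are placeholders; in practice, load them from a `.env` file or a secrets manager):

```python
import os

# Placeholder values -- replace with your own credentials.
os.environ["GOOGLE_API_KEY"] = "<your-google-api-key>"    # Google Custom Search
os.environ["GOOGLE_CSE_ID"] = "<your-cse-id>"
os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"    # ChatOpenAI
os.environ["FI_API_KEY"] = "<your-futureagi-api-key>"     # Future AGI tracing/evals
os.environ["FI_SECRET_KEY"] = "<your-futureagi-secret-key>"
```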
The `eval_tags` list contains multiple instances of `EvalTag`. Each `EvalTag` represents a specific evaluation configuration to be applied at runtime, encapsulating all the parameters the evaluation process needs. Each tag takes the following fields:
- `type`: The category of the evaluation tag. In this cookbook, `EvalTagType.OBSERVATION_SPAN` is used.
- `value`: The kind of operation the evaluation tag targets. `EvalSpanKind.AGENT` targets operations involving the agent; `EvalSpanKind.TOOL` targets operations involving tools.
- `eval_name`: The name of the evaluation to be performed.
- `config`: A dictionary of evaluation-specific configuration. An empty dictionary means default configuration parameters are used.
- `mapping`: A dictionary that maps the evaluation's required inputs to specific attributes of the operation.
- `custom_eval_name`: A user-defined name for this specific evaluation instance.

A sketch of one such tag is shown below.
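As a concrete illustration, here is a minimal sketch of one entry in `eval_tags`. The `EvalName` member and the `mapping` keys here are assumptions; check the Future AGI SDK documentation for the exact names available in your version.

```python
from fi_instrumentation.fi_types import (
    EvalName,
    EvalSpanKind,
    EvalTag,
    EvalTagType,
)

eval_tags = [
    EvalTag(
        type=EvalTagType.OBSERVATION_SPAN,   # category of the tag
        value=EvalSpanKind.AGENT,            # evaluate agent-level spans
        eval_name=EvalName.COMPLETENESS,     # assumed eval name
        config={},                           # empty dict -> default parameters
        mapping={                            # assumed attribute names
            "input": "raw.input",
            "output": "raw.output",
        },
        custom_eval_name="agent_completeness",  # user-defined instance name
    ),
]
```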
Click here to learn more about the evals provided by Future AGI
When registering the `trace_provider`, we need to pass the following parameters to the `register` function:

- `project_type`: The type of project. Here, `ProjectType.EXPERIMENT` is used, since this setup is geared toward experimenting with and evaluating the chatbot.
- `project_name`: A user-defined name for the project.
- `project_version_name`: The version name of the project, used to track different runs of the experiment.
- `eval_tags`: A list of evaluation tags that define the specific evaluations to be applied.

The `instrument` method is then called on the `LangChainInstrumentor` instance. This method sets up instrumentation of the LangChain framework using the provided `tracer_provider`, as sketched below.
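Putting this together, registration and instrumentation look roughly like the following sketch (it assumes the `register` helper exported by `fi_instrumentation`; the project and version names are illustrative):

```python
from fi_instrumentation import register
from fi_instrumentation.fi_types import ProjectType
from traceai_langchain import LangChainInstrumentor

trace_provider = register(
    project_type=ProjectType.EXPERIMENT,      # experimentation project
    project_name="langgraph-search-chatbot",  # illustrative name
    project_version_name="v1",                # track runs of the experiment
    eval_tags=eval_tags,                      # evaluations defined above
)

# Instrument LangChain so every chain/agent/tool call emits spans.
LangChainInstrumentor().instrument(tracer_provider=trace_provider)
```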
Next, we create a Google Search tool using LangChain's `GoogleSearchAPIWrapper`. This tool acts as an external data source the agent can call when it needs current information. We then use `ChatOpenAI` with the `gpt-4o-mini` model and bind it to the search tool.
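A sketch of this wiring is shown below (import paths vary across LangChain versions; `langchain_google_community` is assumed here):

```python
from langchain_core.tools import Tool
from langchain_google_community import GoogleSearchAPIWrapper
from langchain_openai import ChatOpenAI

# Wrap Google Custom Search as a LangChain tool.
search = GoogleSearchAPIWrapper()  # reads GOOGLE_API_KEY / GOOGLE_CSE_ID
google_search = Tool(
    name="google_search",
    description="Search Google for recent or factual information.",
    func=search.run,
)

# Bind the tool to the chat model so the model can request searches.
llm = ChatOpenAI(model="gpt-4o-mini")  # reads OPENAI_API_KEY
llm_with_tools = llm.bind_tools([google_search])
```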
In LangGraph, each step in the agent's logic is represented as a node in a graph. Each node handles a specific task, and the application moves from one node to another depending on the current state of the conversation. In our chatbot, we define three main nodes: an agent node that calls the model to interpret the query, a tool node that runs the Google Search, and a final node that generates the response.
A `router` function then checks whether the agent has requested a tool. If it has, the flow moves to the tool node; if not, the agent proceeds directly to the final node to generate the response. This lets the agent make decisions dynamically based on the query, as in the sketch below.
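For instance, the state and router might look like this (a sketch; the node names and state schema are assumptions):

```python
from typing import Annotated, TypedDict

from langgraph.graph.message import add_messages

class State(TypedDict):
    # Full conversation: user messages, model replies, and tool results.
    messages: Annotated[list, add_messages]

def router(state: State) -> str:
    # Route to the tool node only if the model requested a tool call.
    last_message = state["messages"][-1]
    if getattr(last_message, "tool_calls", None):
        return "tools"
    return "final"
```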
We then combine all the nodes into a complete graph using `StateGraph`. This graph keeps track of the message history and tool results as the conversation progresses. Finally, we test the chatbot by running it on a few sample queries.
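Assembled end to end, the graph might look like the following sketch (it assumes the `State`, `router`, `llm`, `llm_with_tools`, and `google_search` definitions from above):

```python
from langgraph.graph import END, START, StateGraph
from langgraph.prebuilt import ToolNode

def agent(state: State):
    # Let the tool-bound model decide whether a search is needed.
    return {"messages": [llm_with_tools.invoke(state["messages"])]}

def final(state: State):
    # Produce the user-facing answer from the accumulated messages.
    return {"messages": [llm.invoke(state["messages"])]}

graph_builder = StateGraph(State)
graph_builder.add_node("agent", agent)
graph_builder.add_node("tools", ToolNode([google_search]))
graph_builder.add_node("final", final)

graph_builder.add_edge(START, "agent")
graph_builder.add_conditional_edges("agent", router, {"tools": "tools", "final": "final"})
graph_builder.add_edge("tools", "agent")  # hand tool results back to the agent
graph_builder.add_edge("final", END)

graph = graph_builder.compile()

# Run a sample query.
result = graph.invoke({"messages": [("user", "Who won the most recent F1 race?")]})
print(result["messages"][-1].content)
```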
The flow begins at the agent node, which uses the LLM (`ChatOpenAI`) to interpret the user's query. The model decides to use the tool node, which performs a Google Search (`google_search`) using LangChain's wrapper. After fetching results, control returns to the agent node to interpret the tool response. Finally, the system reaches the final node, which generates the output. The bottom panel shows the results of the evals applied at span level.